Concept Induction via Fuzzy C-Means Clustering in a High Dimensional Semantic Space

Authors

  • Dawei Song
  • Guihong Cao
  • Peter Bruza
  • Raymond Lau
Abstract

Lexical semantic space models have recently been investigated as a way to automatically derive the meaning (semantics) of information from natural language usage. In a semantic space, a term can be considered as a concept represented geometrically as a vector, the components of which correspond to terms in a vocabulary. A primary way to perform reasoning in a semantic space is to categorize concepts in the space into a number of regions (i.e., groups). Such a process is referred to as concept induction, which can be realized by clustering objects in the space. The resulting groups can potentially form a basis for knowledge discovery and ontology construction. Conventional clustering algorithms, e.g., the K-Means method, normally produce crisp clusters, i.e., an object can be assigned to only one cluster. This is not always the case in reality. For example, the word “Reagan” may belong both to a cluster about the administration of the US government and to another about the Iran-contra scandal. Therefore, a membership function is applied, which determines the degree to which an object belongs to different clusters. This chapter introduces a cognitively motivated semantic space model, namely Hyperspace Analogue to Language (HAL), and shows how a fuzzy C-Means clustering algorithm is used for concept categorization in the high dimensional semantic space. The experimental results indicate that applying fuzzy C-Means clustering over the HAL semantic space is promising for constructing semantically related groups of terms.
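
The contrast between crisp and fuzzy assignment can be made concrete with a minimal sketch of the textbook fuzzy C-Means update rules. This is not the chapter's implementation: the function name `fuzzy_c_means`, the use of Euclidean distance, the fuzzifier m = 2, and the two-dimensional toy term vectors are all assumptions made for illustration, and the vectors are invented rather than derived from a HAL space.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Hypothetical helper: return (centers, memberships) for a list of vectors."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update: closer centers receive a larger share.
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]  # Euclidean (assumed)
            if min(dists) == 0.0:
                # The point coincides with one or more centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                inv_sum = sum(inv)
                row = [v / inv_sum for v in inv]
            new_u.append(row)

        # Stop once no membership value changes by more than tol.
        shift = max(abs(a - b)
                    for ra, rb in zip(new_u, u)
                    for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Invented toy vectors (NOT HAL data): two groups plus one term in between.
terms = ["president", "administration", "scandal", "contra", "Reagan"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]

centers, memberships = fuzzy_c_means(vectors, c=2)
for term, row in zip(terms, memberships):
    print(term, [round(v, 3) for v in row])
```

In this toy setup, the term placed between the two groups should receive comparable membership in both clusters, while the remaining terms lean strongly toward one cluster. A crisp K-Means assignment would have to force the in-between term into a single group.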


Similar resources

Fuzzy K-Means Clustering on a High Dimensional Semantic Space

One way of representing semantics could be via a high dimensional conceptual space constructed by certain lexical semantic space models. Concepts (words), represented as a vector of other words in the semantic space, can be categorized via clustering techniques into a number of regions reflecting different contexts. The conventional clustering algorithms, e.g., K-means method, however, normally...

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. Many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) are employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to noise, low contrast, and steep...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Bilateral Weighted Fuzzy C-Means Clustering

Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...

Publication date: 2006